Selecting Genes with Dissimilar Discrimination Strength for Sample Class Prediction

Authors

  • Zhipeng Cai
  • Randy Goebel
  • Mohammad R. Salavatipour
  • Yi Shi
  • Lizhe Xu
  • Guohui Lin
Abstract

One of the main applications of microarray technology is to determine the gene expression profiles of diseases and disease treatments. This is typically done by selecting a small number of genes from among thousands to tens of thousands, whose expression values are collectively used as classification profiles. Gene selection is notoriously challenging because microarray data normally contain only a very small number of samples but range over thousands to tens of thousands of genes. Most existing gene selection methods carefully define a function that scores the differential levels of gene expression under a variety of conditions in order to identify top-ranked genes. Such single-gene scoring methods suffer because some selected genes have very similar expression patterns, so using them all in classification is largely redundant. Furthermore, these selected genes can prevent the consideration of other genes that are individually less, but collectively more, differentially expressed. We propose to cluster genes by their class discrimination strength and to limit the number of genes selected per cluster. By combining this idea with several existing single-gene scoring methods, we show by experiments on two cancer microarray datasets that our methods identify gene subsets that collectively achieve significantly higher classification accuracies.
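
The idea of capping the number of selected genes per cluster can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the F-like score, the use of per-class mean expression as each gene's "discrimination profile", the simple leader clustering, and the names `select_genes`, `max_per_cluster`, and `radius` are all hypothetical.

```python
from statistics import mean, pvariance

def class_means(values, labels):
    """Mean expression of one gene within each class (classes in sorted order)."""
    classes = sorted(set(labels))
    return [mean(v for v, y in zip(values, labels) if y == c) for c in classes]

def f_like_score(values, labels):
    """Illustrative single-gene score: variance of class means over mean within-class variance."""
    classes = sorted(set(labels))
    groups = [[v for v, y in zip(values, labels) if y == c] for c in classes]
    between = pvariance([mean(g) for g in groups])
    within = mean(pvariance(g) for g in groups)
    return between / (within + 1e-9)  # small constant avoids division by zero

def leader_clusters(profiles, radius):
    """Assumed clustering: put each gene in the first cluster whose leader is within `radius`."""
    leaders, assignment = [], {}
    for gene, p in profiles.items():
        for idx, q in enumerate(leaders):
            if sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5 <= radius:
                assignment[gene] = idx
                break
        else:
            leaders.append(p)
            assignment[gene] = len(leaders) - 1
    return assignment

def select_genes(expr, labels, k, max_per_cluster=1, radius=0.5):
    """Walk genes in descending score order, taking at most `max_per_cluster` per cluster."""
    scores = {g: f_like_score(v, labels) for g, v in expr.items()}
    profiles = {g: class_means(v, labels) for g, v in expr.items()}
    cluster_of = leader_clusters(profiles, radius)
    taken, selected = {}, []
    for g in sorted(scores, key=scores.get, reverse=True):
        c = cluster_of[g]
        if taken.get(c, 0) < max_per_cluster:
            selected.append(g)
            taken[c] = taken.get(c, 0) + 1
        if len(selected) == k:
            break
    return selected

# Toy data (made up): g1 and g2 behave almost identically; g3 separates the classes differently.
labels = ["A", "A", "B", "B"]
expr = {
    "g1": [1.0, 1.2, 3.0, 3.2],
    "g2": [1.0, 1.4, 3.0, 3.4],
    "g3": [5.0, 5.4, 4.0, 4.2],
}
print(select_genes(expr, labels, k=2, max_per_cluster=1))  # ['g1', 'g3']
print(select_genes(expr, labels, k=2, max_per_cluster=2))  # ['g1', 'g2']
```

With a cap of one gene per cluster, the redundant `g2` is skipped in favor of the lower-scoring but differently behaving `g3`; raising the cap to two reproduces plain top-ranked selection on this toy input.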


Similar resources

Gene Selection for Multi-Class Prediction of Microarray Data

Gene expression data from microarrays have been successfully applied to class prediction, where the purpose is to classify and predict the diagnostic category of a sample by its gene expression profile. A typical microarray dataset consists of expression levels for a large number of genes on a relatively small number of samples. As a consequence, one basic and important question associated with...


Effect of Temperature and Time on the Joint Properties of AISI420 Steel to SAF2507 Steel Produced by Transient Liquid Phase Process

In this research, the effect of temperature and time on the properties of AISI420/SAF2507 dissimilar joint produced by transient liquid phase bonding process was investigated. A BNi-2 interlayer with 25 μm thickness was inserted between two dissimilar steel samples. The bonding process was performed at 1050 °C and 1100 °C for different bonding times. The microstructures of the joints were studi...


Equivalence class formation via identity matching to sample and simple discrimination with class-specific consequences

Human participant performances often show evidence of learning untrained relations when conditional discrimination training between physically dissimilar stimuli is conducted. These emergent relations document equivalence class formation. The current study investigated whether class-specific consequences (i.e. the specific reinforcers used for each potential class during training) also join the...


Dissimilar resistance spot welding of AISI 1075 eutectoid steel to AISI 201 stainless steel

In this paper, dissimilar resistance spot welding of AISI 1075 eutectoid steel to AISI 201 stainless steel is investigated experimentally. For this purpose, the experiments are designed using response surface methodology and based on four-factor, five-level central composite design. The effects of process parameters such as welding current, welding time, cooling time and electrode force are inv...


Comparison of Bayesian and Frequentist Methods in Estimating the Net Reclassification and Integrated Discrimination Improvement Indices for Evaluation of Prediction Models: Tehran Lipid and Glucose Study

Introduction: The frequency-based method is commonly used to estimate the Net Reclassification Improvement (NRI) and Integrated Discrimination Improvement (IDI) indices. These indices measure the magnitude of the performance of statistical models when a new biomarker is added. This method has poor performance in some cases, especially in small samples. In this study, the performance of two Bay...




Publication date: 2007